A LAF/GrAF based Encoding Scheme for underspecified Representations of syntactic Annotations

نویسندگان

  • Manuel Kountz
  • Ulrich Heid
  • Kerstin Eckart
چکیده

Data models and encoding formats for syntactically annotated text corpora need to deal with syntactic ambiguity; underspecified representations are particularly well suited for the representation of ambiguous data because they allow for high informational efficiency. We discuss the issue of being informationally efficient, and the trade-off between efficient encoding of linguistic annotations and complete documentation of linguistic analyses. The main topic of this article is a data model and an encoding scheme based on LAF/GrAF (Ide and Romary, 2006; Ide and Suderman, 2007) which provides a flexible framework for encoding underspecified representations. We show how a set of dependency structures and a set of TiGer graphs (Brants et al., 2002) representing the readings of an ambiguous sentence can be encoded, and we discuss basic issues in querying corpora which are encoded using the framework presented here.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A standardized general framework for encoding and exchange of corpus annotations: The Linguistic Annotation Framework, LAF

The Linguistic Annotation Framework, LAF, proposes a generic data model for exchange of linguistic annotations and has recently become an ISO standard (ISO 24612:2012). This paper describes some aspects of LAF, its XML-serialization GrAF and some use-cases related to the framework. While GrAF has already been used as exchange format for corpora with several annotation layers, such as MASC and O...

متن کامل

LAF/GrAF-grounded Representation of Dependency Structures

This paper shows that a LAF/GrAF-based annotation schema can be used for the adequate representation of syntactic dependency structures in many languages. We first argue that there are at least two types of textual units that can be annotated with dependency information: words/tokens and chunks/phrases. Based on this consideration, we discuss a sub-typing of GrAF to represent the corresponding ...

متن کامل

GrAF: A Graph-based Format for Linguistic Annotations

In this paper we describe the Graph Annotation Format (GrAF) and show how it is used represent not only independent linguistic annotations, but also sets of merged annotations as a single graph. To demonstrate this, we have automatically transduced several different annotations of the Wall Street Journal corpus into GrAF and show how the annotations can then be merged, analyzed, and visualized ...

متن کامل

GrAF: A Graph-based Format for Linguistic Annotations

In this paper we describe the Graph Annotation Format (GrAF) and show how it is used represent not only independent linguistic annotations, but also sets of merged annotations as a single graph. To demonstrate this, we have automatically transduced several different annotations of the Wall Street Journal corpus into GrAF and show how the annotations can then be merged, analyzed, and visualized ...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008